Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 24
Filtrar
2.
bioRxiv ; 2023 Nov 17.
Artículo en Inglés | MEDLINE | ID: mdl-38014266

RESUMEN

Allelic variability in the adaptive immune receptor loci, which harbor the gene segments that encode B cell and T cell receptors (BCR/TCR), has been shown to be of critical importance for immune responses to pathogens and vaccines. In recent years, B cell and T cell receptor repertoire sequencing (Rep-Seq) has become widespread in immunology research making it the most readily available source of information about allelic diversity in immunoglobulin (IG) and T cell receptor (TR) loci in different populations. Here we present a novel algorithm for extra-sensitive and specific variable (V) and joining (J) gene allele inference and genotyping allowing reconstruction of individual high-quality gene segment libraries. The approach can be applied for inferring allelic variants from peripheral blood lymphocyte BCR and TCR repertoire sequencing data, including hypermutated isotype-switched BCR sequences, thus allowing high-throughput genotyping and novel allele discovery from a wide variety of existing datasets. The developed algorithm is a part of the MiXCR software ( https://mixcr.com ) and can be incorporated into any pipeline utilizing upstream processing with MiXCR. We demonstrate the accuracy of this approach using Rep-Seq paired with long-read genomic sequencing data, comparing it to a widely used algorithm, TIgGER. We applied the algorithm to a large set of IG heavy chain (IGH) Rep-Seq data from 450 donors of ancestrally diverse population groups, and to the largest reported full-length TCR alpha and beta chain (TRA; TRB) Rep-Seq dataset, representing 134 individuals. This allowed us to assess the genetic diversity of genes within the IGH, TRA and TRB loci in different populations and demonstrate the connection between antibody repertoire gene usage and the number of allelic variants present in the population. Finally we established a database of allelic variants of V and J genes inferred from Rep-Seq data and their population frequencies with free public access at https://vdj.online .

3.
Nucleic Acids Res ; 51(16): e86, 2023 09 08.
Artículo en Inglés | MEDLINE | ID: mdl-37548401

RESUMEN

In adaptive immune receptor repertoire analysis, determining the germline variable (V) allele associated with each T- and B-cell receptor sequence is a crucial step. This process is highly impacted by allele annotations. Aligning sequences, assigning them to specific germline alleles, and inferring individual genotypes are challenging when the repertoire is highly mutated, or sequence reads do not cover the whole V region. Here, we propose an alternative naming scheme for the V alleles, as well as a novel method to infer individual genotypes. We demonstrate the strengths of the two by comparing their outcomes to other genotype inference methods. We validate the genotype approach with independent genomic long-read data. The naming scheme is compatible with current annotation tools and pipelines. Analysis results can be converted from the proposed naming scheme to the nomenclature determined by the International Union of Immunological Societies (IUIS). Both the naming scheme and the genotype procedure are implemented in a freely available R package (PIgLET https://bitbucket.org/yaarilab/piglet). To allow researchers to further explore the approach on real data and to adapt it for their uses, we also created an interactive website (https://yaarilab.github.io/IGHV_reference_book).


Asunto(s)
Genómica , Cadenas Pesadas de Inmunoglobulina , Receptores de Antígenos de Linfocitos B , Alelos , Genotipo , Receptores de Antígenos de Linfocitos B/genética , Cadenas Pesadas de Inmunoglobulina/genética
4.
Nat Commun ; 14(1): 4419, 2023 07 21.
Artículo en Inglés | MEDLINE | ID: mdl-37479682

RESUMEN

Variation in the antibody response has been linked to differential outcomes in disease, and suboptimal vaccine and therapeutic responsiveness, the determinants of which have not been fully elucidated. Countering models that presume antibodies are generated largely by stochastic processes, we demonstrate that polymorphisms within the immunoglobulin heavy chain locus (IGH) impact the naive and antigen-experienced antibody repertoire, indicating that genetics predisposes individuals to mount qualitatively and quantitatively different antibody responses. We pair recently developed long-read genomic sequencing methods with antibody repertoire profiling to comprehensively resolve IGH genetic variation, including novel structural variants, single nucleotide variants, and genes and alleles. We show that IGH germline variants determine the presence and frequency of antibody genes in the expressed repertoire, including those enriched in functional elements linked to V(D)J recombination, and overlapping disease-associated variants. These results illuminate the power of leveraging IGH genetics to better understand the regulation, function, and dynamics of the antibody response in disease.


Asunto(s)
Genes de las Cadenas Pesadas de las Inmunoglobulinas , Genes de Inmunoglobulinas , Humanos , Genes de las Cadenas Pesadas de las Inmunoglobulinas/genética , Alelos , Mutación de Línea Germinal , Cadenas Pesadas de Inmunoglobulina/genética
5.
J Immunol ; 210(10): 1607-1619, 2023 05 15.
Artículo en Inglés | MEDLINE | ID: mdl-37027017

RESUMEN

Current Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) using short-read sequencing strategies resolve expressed Ab transcripts with limited resolution of the C region. In this article, we present the near-full-length AIRR-seq (FLAIRR-seq) method that uses targeted amplification by 5' RACE, combined with single-molecule, real-time sequencing to generate highly accurate (99.99%) human Ab H chain transcripts. FLAIRR-seq was benchmarked by comparing H chain V (IGHV), D (IGHD), and J (IGHJ) gene usage, complementarity-determining region 3 length, and somatic hypermutation to matched datasets generated with standard 5' RACE AIRR-seq using short-read sequencing and full-length isoform sequencing. Together, these data demonstrate robust FLAIRR-seq performance using RNA samples derived from PBMCs, purified B cells, and whole blood, which recapitulated results generated by commonly used methods, while additionally resolving H chain gene features not documented in IMGT at the time of submission. FLAIRR-seq data provide, for the first time, to our knowledge, simultaneous single-molecule characterization of IGHV, IGHD, IGHJ, and IGHC region genes and alleles, allele-resolved subisotype definition, and high-resolution identification of class switch recombination within a clonal lineage. In conjunction with genomic sequencing and genotyping of IGHC genes, FLAIRR-seq of the IgM and IgG repertoires from 10 individuals resulted in the identification of 32 unique IGHC alleles, 28 (87%) of which were previously uncharacterized. Together, these data demonstrate the capabilities of FLAIRR-seq to characterize IGHV, IGHD, IGHJ, and IGHC gene diversity for the most comprehensive view of bulk-expressed Ab repertoires to date.


Asunto(s)
Regiones Determinantes de Complementariedad , Humanos , Regiones Determinantes de Complementariedad/genética , Secuencia de Bases
6.
Genes Immun ; 24(1): 21-31, 2023 02.
Artículo en Inglés | MEDLINE | ID: mdl-36539592

RESUMEN

Immunoglobulins (IGs), crucial components of the adaptive immune system, are encoded by three genomic loci. However, the complexity of the IG loci severely limits the effective use of short read sequencing, limiting our knowledge of population diversity in these loci. We leveraged existing long read whole-genome sequencing (WGS) data, fosmid technology, and IG targeted single-molecule, real-time (SMRT) long-read sequencing (IG-Cap) to create haplotype-resolved assemblies of the IG Lambda (IGL) locus from 6 ethnically diverse individuals. In addition, we generated 10 diploid assemblies of IGL from a diverse cohort of individuals utilizing IG-Cap. From these 16 individuals, we identified significant allelic diversity, including 36 novel IGLV alleles. In addition, we observed highly elevated single nucleotide variation (SNV) in IGLV genes relative to IGL intergenic and genomic background SNV density. By comparing SNV calls between our high quality assemblies and existing short read datasets from the same individuals, we show a high propensity for false-positives in the short read datasets. Finally, for the first time, we nucleotide-resolved common 5-10 Kb duplications in the IGLC region that contain functional IGLJ and IGLC genes. Together these data represent a significant advancement in our understanding of genetic variation and population diversity in the IGL locus.


Asunto(s)
Genes de Inmunoglobulinas , Cadenas lambda de Inmunoglobulina , Humanos , Cadenas lambda de Inmunoglobulina/genética , Genómica , Variación Genética , Nucleótidos
7.
Trends Immunol ; 44(1): 7-21, 2023 01.
Artículo en Inglés | MEDLINE | ID: mdl-36470826

RESUMEN

The recombination between immunoglobulin (IG) gene segments determines an individual's naïve antibody repertoire and, consequently, (auto)antigen recognition. Emerging evidence suggests that mammalian IG germline variation impacts humoral immune responses associated with vaccination, infection, and autoimmunity - from the molecular level of epitope specificity, up to profound changes in the architecture of antibody repertoires. These links between IG germline variants and immunophenotype raise the question on the evolutionary causes and consequences of diversity within IG loci. We discuss why the extreme diversity in IG loci remains a mystery, why resolving this is important for the design of more effective vaccines and therapeutics, and how recent evidence from multiple lines of inquiry may help us do so.


Asunto(s)
Genes de Inmunoglobulinas , Mutación de Línea Germinal , Animales , Humanos , Genes de Inmunoglobulinas/genética , Inmunidad Humoral/genética , Evolución Biológica , Células Germinativas , Mamíferos
8.
Front Immunol ; 14: 1330153, 2023.
Artículo en Inglés | MEDLINE | ID: mdl-38406579

RESUMEN

Introduction: Analysis of an individual's immunoglobulin (IG) gene repertoire requires the use of high-quality germline gene reference sets. When sets only contain alleles supported by strong evidence, AIRR sequencing (AIRR-seq) data analysis is more accurate and studies of the evolution of IG genes, their allelic variants and the expressed immune repertoire is therefore facilitated. Methods: The Adaptive Immune Receptor Repertoire Community (AIRR-C) IG Reference Sets have been developed by including only human IG heavy and light chain alleles that have been confirmed by evidence from multiple high-quality sources. To further improve AIRR-seq analysis, some alleles have been extended to deal with short 3' or 5' truncations that can lead them to be overlooked by alignment utilities. To avoid other challenges for analysis programs, exact paralogs (e.g. IGHV1-69*01 and IGHV1-69D*01) are only represented once in each set, though alternative sequence names are noted in accompanying metadata. Results and discussion: The Reference Sets include less than half the previously recognised IG alleles (e.g. just 198 IGHV sequences), and also include a number of novel alleles: 8 IGHV alleles, 2 IGKV alleles and 5 IGLV alleles. Despite their smaller sizes, erroneous calls were eliminated, and excellent coverage was achieved when a set of repertoires comprising over 4 million V(D)J rearrangements from 99 individuals were analyzed using the Sets. The version-tracked AIRR-C IG Reference Sets are freely available at the OGRDB website (https://ogrdb.airr-community.org/germline_sets/Human) and will be regularly updated to include newly observed and previously reported sequences that can be confirmed by new high-quality data.


Asunto(s)
Genes de Inmunoglobulinas , Inmunoglobulinas , Humanos , Inmunoglobulinas/genética , Alelos , Recombinación V(D)J/genética , Células Germinativas
9.
Am J Hum Genet ; 109(6): 1065-1076, 2022 06 02.
Artículo en Inglés | MEDLINE | ID: mdl-35609568

RESUMEN

The human genome contains tens of thousands of large tandem repeats and hundreds of genes that show common and highly variable copy-number changes. Due to their large size and repetitive nature, these variable number tandem repeats (VNTRs) and multicopy genes are generally recalcitrant to standard genotyping approaches and, as a result, this class of variation is poorly characterized. However, several recent studies have demonstrated that copy-number variation of VNTRs can modify local gene expression, epigenetics, and human traits, indicating that many have a functional role. Here, using read depth from whole-genome sequencing to profile copy number, we report results of a phenome-wide association study (PheWAS) of VNTRs and multicopy genes in a discovery cohort of ∼35,000 samples, identifying 32 traits associated with copy number of 38 VNTRs and multicopy genes at 1% FDR. We replicated many of these signals in an independent cohort and observed that VNTRs showing trait associations were significantly enriched for expression QTLs with nearby genes, providing strong support for our results. Fine-mapping studies indicated that in the majority (∼90%) of cases, the VNTRs and multicopy genes we identified represent the causal variants underlying the observed associations. Furthermore, several lie in regions where prior SNV-based GWASs have failed to identify any significant associations with these traits. Our study indicates that copy number of VNTRs and multicopy genes contributes to diverse human traits and suggests that complex structural variants potentially explain some of the so-called "missing heritability" of SNV-based GWASs.


Asunto(s)
Variaciones en el Número de Copia de ADN , Repeticiones de Minisatélite , Variaciones en el Número de Copia de ADN/genética , Genoma Humano , Estudio de Asociación del Genoma Completo , Humanos , Repeticiones de Minisatélite/genética , Fenotipo
10.
Genome Med ; 14(1): 2, 2022 01 07.
Artículo en Inglés | MEDLINE | ID: mdl-34991709

RESUMEN

BACKGROUND: T and B cell receptor (TCR, BCR) repertoires constitute the foundation of adaptive immunity. Adaptive immune receptor repertoire sequencing (AIRR-seq) is a common approach to study immune system dynamics. Understanding the genetic factors influencing the composition and dynamics of these repertoires is of major scientific and clinical importance. The chromosomal loci encoding for the variable regions of TCRs and BCRs are challenging to decipher due to repetitive elements and undocumented structural variants. METHODS: To confront this challenge, AIRR-seq-based methods have recently been developed for B cells, enabling genotype and haplotype inference and discovery of undocumented alleles. However, this approach relies on complete coverage of the receptors' variable regions, whereas most T cell studies sequence a small fraction of that region. Here, we adapted a B cell pipeline for undocumented alleles, genotype, and haplotype inference for full and partial AIRR-seq TCR data sets. The pipeline also deals with gene assignment ambiguities, which is especially important in the analysis of data sets of partial sequences. RESULTS: From the full and partial AIRR-seq TCR data sets, we identified 39 undocumented polymorphisms in T cell receptor Beta V (TRBV) and 31 undocumented 5 ' UTR sequences. A subset of these inferences was also observed using independent genomic approaches. We found that a single nucleotide polymorphism differentiating between the two documented T cell receptor Beta D2 (TRBD2) alleles is strongly associated with dramatic changes in the expressed repertoire. CONCLUSIONS: We reveal a rich picture of germline variability and demonstrate how a single nucleotide polymorphism dramatically affects the composition of the whole repertoire. Our findings provide a basis for annotation of TCR repertoires for future basic and clinical studies.


Asunto(s)
Secuenciación de Nucleótidos de Alto Rendimiento , Receptores de Antígenos de Linfocitos T alfa-beta , Alelos , Células Germinativas , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Humanos , Receptores de Antígenos de Linfocitos T/genética , Receptores de Antígenos de Linfocitos T alfa-beta/genética
11.
Cell Genom ; 2(12): 100228, 2022 Dec 14.
Artículo en Inglés | MEDLINE | ID: mdl-36778049

RESUMEN

T cell receptors (TCRs) recognize peptide fragments presented by the major histocompatibility complex (MHC) and are critical to T cell-mediated immunity. Recent data have indicated that genetic diversity within TCR-encoding gene regions is underexplored, limiting understanding of the impact of TCR loci polymorphisms on TCR function in disease, even though TCR repertoire signatures (1) are heritable and (2) associate with disease phenotypes. To address this, we developed a targeted long-read sequencing approach to generate highly accurate haplotype resolved assemblies of the TCR beta (TRB) and alpha/delta (TRA/D) loci, facilitating the genotyping of all variant types, including structural variants. We validate our approach using two mother-father-child trios and 5 unrelated donors representing multiple populations. This resulted in improved genotyping accuracy and the discovery of 84 undocumented V, D, J, and C alleles, demonstrating the utility of this framework for improving our understanding of TCR diversity and function in disease.

12.
PLoS One ; 16(12): e0261374, 2021.
Artículo en Inglés | MEDLINE | ID: mdl-34898642

RESUMEN

Lymphoblastoid cell lines (LCLs) have been critical to establishing genetic resources for biomedical science. They have been used extensively to study human genetic diversity, genome function, and inform the development of tools and methodologies for augmenting disease genetics research. While the validity of variant callsets from LCLs has been demonstrated for most of the genome, previous work has shown that DNA extracted from LCLs is modified by V(D)J recombination within the immunoglobulin (IG) loci, regions that harbor antibody genes critical to immune system function. However, the impacts of V(D)J on short read sequencing data generated from LCLs has not been extensively investigated. In this study, we used LCL-derived short read sequencing data from the 1000 Genomes Project (n = 2,504) to identify signatures of V(D)J recombination. Our analyses revealed sample-level impacts of V(D)J recombination that varied depending on the degree of inferred monoclonality. We showed that V(D)J associated somatic deletions impacted genotyping accuracy, leading to adulterated population-level estimates of allele frequency and linkage disequilibrium. These findings illuminate limitations of using LCLs and short read data for building genetic resources in the IG loci, with implications for interpreting previous disease association studies in these regions.


Asunto(s)
Leucemia-Linfoma Linfoblástico de Células Precursoras/genética , Leucemia-Linfoma Linfoblástico de Células Precursoras/metabolismo , Recombinación V(D)J/genética , Alelos , Linfocitos B/inmunología , Línea Celular Tumoral/inmunología , Línea Celular Tumoral/metabolismo , Bases de Datos Genéticas , Frecuencia de los Genes/genética , Genes de Inmunoglobulinas/genética , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Humanos , Región Variable de Inmunoglobulina/genética , Inmunoglobulinas/genética , Inmunoglobulinas/inmunología , Análisis de Secuencia de ADN/métodos , Recombinación V(D)J/inmunología
14.
Am J Hum Genet ; 108(5): 809-824, 2021 05 06.
Artículo en Inglés | MEDLINE | ID: mdl-33794196

RESUMEN

Variable number tandem repeats (VNTRs) are composed of large tandemly repeated motifs, many of which are highly polymorphic in copy number. However, because of their large size and repetitive nature, they remain poorly studied. To investigate the regulatory potential of VNTRs, we used read-depth data from Illumina whole-genome sequencing to perform association analysis between copy number of ∼70,000 VNTRs (motif size ≥ 10 bp) with both gene expression (404 samples in 48 tissues) and DNA methylation (235 samples in peripheral blood), identifying thousands of VNTRs that are associated with local gene expression (eVNTRs) and DNA methylation levels (mVNTRs). Using an independent cohort, we validated 73%-80% of signals observed in the two discovery cohorts, while allelic analysis of VNTR length and CpG methylation in 30 Oxford Nanopore genomes gave additional support for mVNTR loci, thus providing robust evidence to support that these represent genuine associations. Further, conditional analysis indicated that many eVNTRs and mVNTRs act as QTLs independently of other local variation. We also observed strong enrichments of eVNTRs and mVNTRs for regulatory features such as enhancers and promoters. Using the Human Genome Diversity Panel, we define sets of VNTRs that show highly divergent copy numbers among human populations and show that these are enriched for regulatory effects and preferentially associate with genes that have been linked with human phenotypes through GWASs. Our study provides strong evidence supporting functional variation at thousands of VNTRs and defines candidate sets of VNTRs, copy number variation of which potentially plays a role in numerous human phenotypes.


Asunto(s)
Variaciones en el Número de Copia de ADN/genética , Metilación de ADN , Regulación de la Expresión Génica , Repeticiones de Minisatélite/genética , Sitios de Carácter Cuantitativo/genética , Adolescente , Adulto , Algoritmos , Niño , Preescolar , Cromosomas Humanos X/genética , Estudios de Cohortes , Islas de CpG/genética , Elementos de Facilitación Genéticos/genética , Femenino , Estudio de Asociación del Genoma Completo , Genotipo , Humanos , Lactante , Recién Nacido , Masculino , Persona de Mediana Edad , Fenotipo , Regiones Promotoras Genéticas/genética , Adulto Joven
15.
Front Immunol ; 11: 2136, 2020.
Artículo en Inglés | MEDLINE | ID: mdl-33072076

RESUMEN

An incomplete ascertainment of genetic variation within the highly polymorphic immunoglobulin heavy chain locus (IGH) has hindered our ability to define genetic factors that influence antibody-mediated processes. Due to locus complexity, standard high-throughput approaches have failed to accurately and comprehensively capture IGH polymorphism. As a result, the locus has only been fully characterized two times, severely limiting our knowledge of human IGH diversity. Here, we combine targeted long-read sequencing with a novel bioinformatics tool, IGenotyper, to fully characterize IGH variation in a haplotype-specific manner. We apply this approach to eight human samples, including a haploid cell line and two mother-father-child trios, and demonstrate the ability to generate high-quality assemblies (>98% complete and >99% accurate), genotypes, and gene annotations, identifying 2 novel structural variants and 15 novel IGH alleles. We show multiplexing allows for scaling of the approach without impacting data quality, and that our genotype call sets are more accurate than short-read (>35% increase in true positives and >97% decrease in false-positives) and array/imputation-based datasets. This framework establishes a desperately needed foundation for leveraging IG genomic data to study population-level variation in antibody-mediated immunity, critical for bettering our understanding of disease risk, and responses to vaccines and therapeutics.


Asunto(s)
Biología Computacional/métodos , Genes de Inmunoglobulinas , Variación Genética , Técnicas de Genotipaje , Haplotipos/genética , Cadenas Pesadas de Inmunoglobulina/genética , Polimorfismo Genético , Línea Celular , Presentación de Datos , Conjuntos de Datos como Asunto , Familia , Biblioteca de Genes , Variación Estructural del Genoma , Humanos , Anotación de Secuencia Molecular , Alineación de Secuencia , Análisis de Secuencia de ADN , Homología de Secuencia de Ácido Nucleico , Interfaz Usuario-Computador , Flujo de Trabajo
16.
Am J Hum Genet ; 107(4): 654-669, 2020 10 01.
Artículo en Inglés | MEDLINE | ID: mdl-32937144

RESUMEN

There is growing recognition that epivariations, most often recognized as promoter hypermethylation events that lead to gene silencing, are associated with a number of human diseases. However, little information exists on the prevalence and distribution of rare epigenetic variation in the human population. In order to address this, we performed a survey of methylation profiles from 23,116 individuals using the Illumina 450k array. Using a robust outlier approach, we identified 4,452 unique autosomal epivariations, including potentially inactivating promoter methylation events at 384 genes linked to human disease. For example, we observed promoter hypermethylation of BRCA1 and LDLR at population frequencies of ∼1 in 3,000 and ∼1 in 6,000, respectively, suggesting that epivariations may underlie a fraction of human disease which would be missed by purely sequence-based approaches. Using expression data, we confirmed that many epivariations are associated with outlier gene expression. Analysis of variation data and monozygous twin pairs suggests that approximately two-thirds of epivariations segregate in the population secondary to underlying sequence mutations, while one-third are likely sporadic events that occur post-zygotically. We identified 25 loci where rare hypermethylation coincided with the presence of an unstable CGG tandem repeat, validated the presence of CGG expansions at several loci, and identified the putative molecular defect underlying most of the known folate-sensitive fragile sites in the genome. Our study provides a catalog of rare epigenetic changes in the human genome, gives insight into the underlying origins and consequences of epivariations, and identifies many hypermethylated CGG repeat expansions.


Asunto(s)
Proteína BRCA1/genética , Epigénesis Genética , Enfermedades Genéticas Congénitas/genética , Genoma Humano , Receptores de LDL/genética , Expansión de Repetición de Trinucleótido , Proteína BRCA1/metabolismo , Metilación de ADN , Femenino , Ácido Fólico/metabolismo , Silenciador del Gen , Enfermedades Genéticas Congénitas/diagnóstico , Enfermedades Genéticas Congénitas/patología , Sitios Genéticos , Variación Genética , Secuenciación de Nucleótidos de Alto Rendimiento , Humanos , Masculino , Regiones Promotoras Genéticas , Receptores de LDL/metabolismo , Gemelos Monocigóticos
18.
Nat Biotechnol ; 38(11): 1347-1355, 2020 11.
Artículo en Inglés | MEDLINE | ID: mdl-32541955

RESUMEN

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.


Asunto(s)
Mutación de Línea Germinal/genética , Mutación INDEL/genética , Diploidia , Variación Estructural del Genoma , Humanos , Anotación de Secuencia Molecular , Análisis de Secuencia de ADN
19.
Hum Mutat ; 41(4): 800-806, 2020 04.
Artículo en Inglés | MEDLINE | ID: mdl-31898844

RESUMEN

The mechanisms underlying de novo insertion/deletion (indel) genesis, such as polymerase slippage, have been hypothesized but not well characterized in the human genome. We implemented two methodological improvements, which were leveraged to dissect indel mutagenesis. We assigned de novo variants to parent-of-origin (i.e., phasing) with low-coverage long-read whole-genome sequencing, achieving better phasing compared to short-read sequencing (medians of 84% and 23%, respectively). We then wrote an application programming interface to classify indels into three subtypes according to sequence context. Across three cohorts with different phasing methods (Ntrios = 540, all cohorts), we observed that one de novo indel subtype, change in copy count (CCC), was significantly correlated with father's (p = 7.1 × 10-4 ) but not mother's (p = .45) age at conception. We replicated this effect in three cohorts without de novo phasing (ppaternal = 1.9 × 10-9 , pmaternal = .61; Ntrios = 3,391, all cohorts). Although this is consistent with polymerase slippage during spermatogenesis, the percentage of variance explained by paternal age was low, and we did not observe an association with replication timing. These results suggest that spermatogenesis-specific events have a minor role in CCC indel mutagenesis, one not observed for other indel subtypes nor for maternal age in general. These results have implications for indel modeling in evolution and disease.


Asunto(s)
Biología Computacional/métodos , Genoma Humano , Genómica/métodos , Mutación INDEL , Programas Informáticos , Secuenciación de Nucleótidos de Alto Rendimiento , Humanos , Polimorfismo de Nucleótido Simple
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA
...